Python Crawler: Finding and Analyzing the Baidu Cloud Transfer API
The first step is to find and analyze Baidu Cloud's transfer API. You need a Baidu Cloud account and must be logged in. In a browser such as Firefox, open a Baidu Cloud share link, then press F12 to open the developer tools and start capturing traffic in the Network tab.
Next, perform a transfer manually: select all files, choose "Save to Network Disk," pick a destination folder, and click OK. Clear the captured requests just before doing so, so that the transfer API call is easy to pick out. This is the control variable method familiar from middle-school science: change one thing at a time so that its effect is easy to observe.
As shown in the image above, the capture contains a POST request whose URL includes the word "transfer". This is the API endpoint responsible for transferring files. The next step is to analyze its request headers and parameters so the transfer can be reproduced programmatically.
Click on the request, then go to the "Cookies" tab on the right side to view the cookies included in the request header. These cookies are essential for maintaining the logged-in session during the transfer process.
Cookie Analysis
Because the transfer happens while you are logged in, the request must carry the cookies that represent the login session. To find out which ones matter, apply the control variable method again: in the browser's privacy settings, delete the Baidu-related cookies one at a time and refresh the page after each deletion. If the login is lost after removing a particular cookie, that cookie is required to maintain the session.
Repeating this process shows that two cookies, BDUSS and STOKEN, are critical for the transfer operation. If either one is missing or expired, the transfer fails and prompts you to log in again, so both must be included in every automated transfer request.
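The manual elimination described above can also be scripted. The sketch below assumes you have some way to test whether a given set of cookies is still logged in; the `is_logged_in` callback is hypothetical and left to the caller (for example, a function that requests a page and looks for your username). It removes one cookie at a time while keeping all the others fixed, and reports the cookies whose removal alone breaks the login.

```python
from typing import Callable, Dict, List


def find_required_cookies(
    cookies: Dict[str, str],
    is_logged_in: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Return the names of cookies whose removal alone breaks the login check."""
    required = []
    for name in cookies:
        # Control variable: drop exactly one cookie and keep all others unchanged
        trial = {key: value for key, value in cookies.items() if key != name}
        # is_logged_in is a caller-supplied (hypothetical) check of the session state
        if not is_logged_in(trial):
            required.append(name)
    return required
```

Removing one cookie at a time only flags cookies that are individually necessary; if two cookies could substitute for each other, neither would be flagged.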
Now, construct the request headers accordingly, with BDUSS and STOKEN placed in the Cookie header.
In addition to the BDUSS and STOKEN cookies, other request headers can be copied directly from the original request made during manual transfer. However, it's important to note that cookies have expiration dates, so they may need to be refreshed periodically. Also, different accounts will have different cookie values, so each user will need to manage their own set of cookies.
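A minimal sketch of the headers as a plain dict, the form accepted by HTTP libraries such as `requests`. The BDUSS and STOKEN values come from your own browser; the User-Agent and Referer are placeholders you would copy from the captured transfer request, and `build_headers` is a hypothetical helper name.

```python
from typing import Dict, Optional


def build_headers(
    bduss: str,
    stoken: str,
    user_agent: str,
    referer: str,
    extra: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Assemble request headers that carry the two login cookies."""
    headers = {
        # Only the two cookies identified above; add others if your capture needs them
        "Cookie": f"BDUSS={bduss}; STOKEN={stoken}",
        # Copy these values from the original transfer request in the developer tools
        "User-Agent": user_agent,
        "Referer": referer,
    }
    # Any further headers copied verbatim from the captured request
    if extra:
        headers.update(extra)
    return headers
```

Because the cookies expire and differ from account to account, it is usually easier to read them from an environment variable or a config file than to hard-code them.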
Parameter Analysis
Now, let’s take a closer look at the request parameters. Click on the "Params" tab to view the query string and form data sent with the request.
In the query string, you’ll find several key parameters such as shareid, from, and bdstoken. These change with the share link or the account and must be supplied for each transfer, while the remaining query parameters can be copied directly from the original request. The form data, in turn, contains the file list and the target path where the files will be saved.
To successfully simulate the transfer, you need to provide the following parameters: shareid, from, bdstoken, filelist, and path. While bdstoken and path can be manually set, shareid, from, and filelist must be extracted from the shared link. This process will be explained in detail in the next section.
Next, build the URL for the transfer request by combining these query parameters with the ones copied from the original request.
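A sketch of the URL construction. The endpoint below is an assumption based on the request name seen in the capture, so confirm the exact URL and any fixed parameters in your own Network tab. The names `build_transfer_url`, `from_uk`, and `extra_params` are hypothetical; `from_uk` assumes that `from` carries the sharer's user ID.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumed endpoint: confirm it against the "transfer" request in your own capture
TRANSFER_ENDPOINT = "https://pan.baidu.com/share/transfer"


def build_transfer_url(
    shareid: str,
    from_uk: str,
    bdstoken: str,
    extra_params: Optional[Dict[str, str]] = None,
) -> str:
    """Build the transfer URL from the per-share query parameters."""
    params = {"shareid": shareid, "from": from_uk, "bdstoken": bdstoken}
    # Fixed parameters copied unchanged from the captured request
    if extra_params:
        params.update(extra_params)
    return f"{TRANSFER_ENDPOINT}?{urlencode(params)}"
```

`urlencode` handles the escaping, which matters if a copied parameter contains characters such as `&` or spaces.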
Crawling shareid, from, filelist, and sending the transfer request
Take a shared link as an example. The specific link may expire over time, but the page structure stays the same. Open the link in your browser, open the developer console, and use its search function to look for "shareid" and identify the correct occurrence.
You’ll find multiple instances of "shareid," but only the last one is related to the actual file being shared. Double-clicking it reveals the formatted JavaScript code, where all the necessary information is stored. For instance, the file list is contained within yunData.FILEINFO, which is a JSON object.
The JSON data writes non-ASCII characters as Unicode escape sequences (\uXXXX), so Chinese file names appear as escapes rather than readable characters in the page source. Once the data is parsed in Python, they display correctly.
Fetching the page directly with requests may return a 404 error, possibly because some headers are missing. To avoid this, the author used Selenium WebDriver and loaded the page twice: the first request obtains the BAIDUID cookie, and the second proceeds normally.
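The author's fix used Selenium WebDriver. The sketch below swaps in a different technique: a `requests.Session` keeps cookies between calls, so requesting the page twice lets the second request carry the BAIDUID cookie set by the first. Whether two plain HTTP requests are enough without a browser is an assumption you should verify. The function accepts any session-like object with a `get` method; `fetch_share_page` is a hypothetical name.

```python
def fetch_share_page(session, share_url, headers=None):
    """Request the share page twice through one session and return the second body.

    `session` is expected to behave like requests.Session, which stores cookies
    from each response and sends them with later requests.
    """
    # First request: assumed to set the BAIDUID cookie in the session's cookie jar
    session.get(share_url, headers=headers)
    # Second request: sent with BAIDUID now attached by the session
    response = session.get(share_url, headers=headers)
    response.raise_for_status()
    return response.text
```

A typical call would be `fetch_share_page(requests.Session(), share_url, headers)`, reusing the same session for the later transfer request so the cookies stay together.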
To examine the structure of yunData.FILEINFO, copy it into a JSON validator, which formats it for better readability.
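Python's `json` module can handle both the escaped characters and the readability problem locally: `json.loads` turns the `\uXXXX` escapes back into characters, and `json.dumps` with `indent` and `ensure_ascii=False` produces an indented view with the characters left readable. The fragment below is made up for illustration; the `server_filename` and `size` keys are assumptions, not copied from a real response.

```python
import json

# Made-up fragment: the file name is written as \uXXXX escapes, as in the page source
raw = '[{"server_filename": "\\u6d4b\\u8bd5.txt", "size": 1024}]'

files = json.loads(raw)
name = files[0]["server_filename"]  # decoded to "测试.txt"

# Indented dump that keeps non-ASCII characters instead of re-escaping them
pretty = json.dumps(files, indent=2, ensure_ascii=False)
print(pretty)
```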
With the parameters located, you can extract them from the page source using regular expressions and pass them to the transfer API.
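A sketch of the extraction. The patterns are assumptions: they suppose the page source contains `"shareid":<digits>`, `"uk":<digits>` (taken here as the value for the `from` parameter), and a `yunData.FILEINFO = [...];` assignment. Adjust them to match what you actually see in the developer console. Following the observation that only the last `shareid` belongs to the shared file, the last match is used; `extract_share_params` is a hypothetical name.

```python
import json
import re
from typing import Any, List, Tuple

# Assumed patterns: adjust them to the page source you actually receive
SHAREID_RE = re.compile(r'"shareid":(\d+)')
UK_RE = re.compile(r'"uk":(\d+)')
FILEINFO_RE = re.compile(r"yunData\.FILEINFO\s*=\s*(\[.*?\]);", re.S)


def extract_share_params(page_text: str) -> Tuple[str, str, List[Any]]:
    """Return (shareid, from_uk, fileinfo) parsed from the share page source."""
    shareids = SHAREID_RE.findall(page_text)
    uks = UK_RE.findall(page_text)
    fileinfo_match = FILEINFO_RE.search(page_text)
    if not shareids or not uks or fileinfo_match is None:
        raise ValueError("share parameters not found in the page source")
    # Only the last "shareid" belongs to the shared file; the same rule is assumed for "uk"
    shareid = shareids[-1]
    from_uk = uks[-1]
    # json.loads also decodes the \uXXXX escapes in file names
    fileinfo = json.loads(fileinfo_match.group(1))
    return shareid, from_uk, fileinfo
```

The non-greedy FILEINFO pattern stops at the first `];` it finds, which would only go wrong if that sequence appeared inside a file name.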
Once these three parameters are retrieved, you can proceed to use the previously discussed transfer method to complete the file transfer.
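A sketch of the final request. It assumes that the form fields are named `filelist` and `path` as seen in the capture, that `filelist` is a JSON-encoded list of file paths taken from the FILEINFO entries, and that the JSON response uses `errno == 0` for success; all three are assumptions to check against your own capture. `session` can be a `requests.Session`, and `transfer` is a hypothetical name.

```python
import json
from typing import Any, Dict, List


def transfer(
    session,
    transfer_url: str,
    headers: Dict[str, str],
    file_paths: List[str],
    target_dir: str,
) -> Dict[str, Any]:
    """POST the transfer form and return the decoded JSON response."""
    data = {
        # Assumed format: a JSON-encoded list of file paths from the share
        "filelist": json.dumps(file_paths, ensure_ascii=False),
        # Destination folder in your own network disk
        "path": target_dir,
    }
    response = session.post(transfer_url, headers=headers, data=data)
    response.raise_for_status()
    result = response.json()
    # Assumption: errno == 0 signals success in Baidu's JSON responses
    if result.get("errno") != 0:
        raise RuntimeError(f"transfer failed: {result}")
    return result
```

Here `transfer_url` is the full transfer URL with shareid, from, and bdstoken filled in, and `headers` carries the BDUSS and STOKEN cookies.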