First off, I'll assume that your dependencies are listed in requirements.txt. To package and zip the dependencies, run the following at the command line:
pip install -t dependencies -r requirements.txt
zip -r ../dependencies.zip .
Above, the cd dependencies command is crucial to ensure that the modules are the in the top level of the zip file. Thanks to Dan Corin's post for heads up.
Next, submit the job via:
spark-submit --py-files dependencies.zip spark_job.py
The --py-files directive sends the zip file to the Spark workers but does not add it to the PYTHONPATH (source of confusion for me). To add the dependencies to the PYTHONPATH to fix the ImportError, add the following line to the Spark job, spark_job.py:
- A JOIN clause is used to combine rows from two or more tables, based on a related column between them.
- Command Used to back up a database in Linux
mysqldump -u root -p edusoho > edusoho.sql
- MySQL Directory Layout
If install from rpm package, the .lib .h and all other type of files are placed by rpm into the corresponding directories
If install by uncompressing a tar, the mysql directory layout will be preserved.
- MySQL Command
mysql > show processlist;
mysql > show status;
Productivity v.s. Complexity
Over engineering happens when the productivity gained does not worth the effort of work and the complexity added into the system.
A concern in software development that aims to avoid software bugs that cause security vulnerability dealing with random-memory-access, such as buffer overflows and dangling pointers.
Type safety is the extent to which a programming language discourages or prevents type errors.
How to Write Multi-threading Code ?
The multi-threading programming of Java is achieved through the use of Thread object.
1. Declare a class to be the subclass of Thread class and overrides its run method.
2. Declare a class and implement the Runnable interface, then implement the run method. (Recommended)
The create of a new thread requires that a Thread object to be created with a Runnable object given as the first argument. To start running a new thread, a thread.start() method is provided. Also, thread.join() method is provided to synchronize states between different threads.
Static class members are shared by all threads (atomic w/r). Keyword: Synchronized is provided to ensure the all threads execute certain methods sequentially.
Class method members (as well as local variables in the method definition) is independent for each threads.
Start a docker container and launch multiple background process (service) through docker run is impossible. The reason is that docker run does not support multiple command given as docker run argument. Multiple command execution is only viable through /bin/bash -c "command1; command2". However, a container is by design exited whenever the /bin/bash process is finished. Because of this mechanism, all service launching command like "python web.py" must not ended with a daemon decorator "&", such as /bin/bash -c "python web1.py & python web2.py &" (In this case, the /bin/bash process finishes so that the container then exited").
Theoretically, a command such as /bin/bash -c "python web1.py & python web2.py" should bypass this problem since /bin/bash process got stuck with the web2.py process running in the foreground (within the container)", but this proved to be wrong in practice due to unknown reasons".
It is worth to mention that commands such as /bin/bash -c "python web1.py; python web2.py" will keep the container running, but the launch of web1.py will prevent web2.py from launching forever (because of the ; decorator).
1. /bin/bash -c "command1; command2" is rarely useful
2. docker run xxx /bin/bash -c "command" can be replaced by docker run xxx command
3. Multiple services should be split into different containers/images (container and service should maintain 1-on-1 mapping)